Simplify expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731
Simplify expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731alamb merged 3 commits intoapache:mainfrom
expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731Conversation
7fd7cde to
8375534
Compare
8375534 to
fdb54b8
Compare
xudong963
left a comment
There was a problem hiding this comment.
A clever simplify! It could reduce the CPU cost for FilterExec and row filter in parquet
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
is this right? It looks like it has rewritten
((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol'));to
((dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name = 'Carol'));But name and e.name are different 🤔
There was a problem hiding this comment.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));
alamb
left a comment
There was a problem hiding this comment.
Thanks @simonvandel and @xudong963
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));
Which issue does this PR close?
Rationale for this change
Add a simplifier rule to remove a comparison.
It's probably unlikely that a human would write such an expression, but a generated query, or other optimizations, may lead to such an expression.
What changes are included in this PR?
Simplify
expr = L1 AND expr != L2toexpr = L1whenL1 != L2Are these changes tested?
Added unit tests and SLT.
The first commit shows the plan before the optimization.
Are there any user-facing changes?
Fewer runtime-ops if the plan contains the pattern.